
    Acoustic impacts of geometric approximation at the level of velum and epiglottis on French vowels

    In this work we study the effect of the velum and the epiglottis on the production of five French vowels. Our purpose is to examine whether the geometry of the vocal tract can be simplified, within the framework of articulatory synthesis, without changing its acoustic properties. In the present study, we use MRI to acquire the 3D shape of the vocal tract while simultaneously recording the speech signal. The two-dimensional geometric shape derived from these data was used as input to numerical acoustic simulations. The shape was edited at the level of the epiglottis and the velum (with or without the epiglottis, with or without a constant-wall approximation at the velum), and the spectra obtained via numerical acoustic simulations were compared with those obtained from the audio recordings. This allows the impact of these articulators and of the geometric simplifications to be assessed.
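The abstract compares spectra simulated on edited geometries against recordings. One simple way to summarise such comparisons is the signed relative deviation of each formant from a reference. The sketch below is an illustration, not the study's pipeline: it assumes formant frequencies have already been extracted, and it uses the textbook uniform-tube (quarter-wavelength) resonances as a stand-in reference. The names `uniform_tube_formants` and `formant_deviation_percent`, and the 350 m/s speed of sound, are assumptions.

```python
import numpy as np

# Assumed speed of sound in the warm, humid vocal tract (m/s); not from the study.
SPEED_OF_SOUND = 350.0


def uniform_tube_formants(length_m: float, n_formants: int = 3,
                          c: float = SPEED_OF_SOUND) -> np.ndarray:
    """Resonances of a uniform tube closed at the glottis and open at the lips:
    F_n = (2n - 1) * c / (4 * L)."""
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * c / (4.0 * length_m)


def formant_deviation_percent(reference_hz, simplified_hz) -> np.ndarray:
    """Signed deviation (%) of each formant of a simplified geometry
    relative to the same formant of the reference geometry."""
    ref = np.asarray(reference_hz, dtype=float)
    sim = np.asarray(simplified_hz, dtype=float)
    if ref.shape != sim.shape:
        raise ValueError("both inputs must list the same formants")
    return 100.0 * (sim - ref) / ref
```

For a 17.5 cm uniform tube, `uniform_tube_formants(0.175)` gives about 500, 1500 and 2500 Hz; a simplified geometry whose formants are within a few percent of the reference would be a candidate for replacing it.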

    Emotion recognition from phoneme-duration information

    The duration of each phoneme is extracted for several emotions. Information on the phonemes and their durations is used to train a Variational AutoEncoder (VAE) to create a latent space z that represents emotion information. The loss functions used for this purpose are the reconstruction loss, the Kullback-Leibler (KL) divergence and the multiclass N-pair loss. Test samples are classified using a nearest-neighbor criterion between their latent representation and the clusters associated with each emotion, as estimated from the training data. Two metrics were used to evaluate the models: emotion recognition accuracy and the consistency of the clusters in the latent space.
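A minimal NumPy sketch of the pieces named above, not the authors' implementation: the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, the multiclass N-pair loss in its usual batch form (row i of the anchors and of the positives share a class, and every other row serves as a negative), and a classifier that assigns each test embedding to the nearest emotion cluster. Representing each cluster by its centroid and using Euclidean distance are assumptions; the reconstruction term, which depends on how phoneme and duration inputs are encoded, is left out. All function names are hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), one value per sample."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)


def n_pair_loss(anchors, positives):
    """Multiclass N-pair loss over N (anchor, positive) pairs from N classes.
    Equivalent to softmax cross-entropy where pair i is the correct 'class'."""
    logits = anchors @ positives.T                       # (N, N) dot-product similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # row shift for numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))


def fit_centroids(z_train, labels):
    """One centroid per emotion label, estimated from training embeddings."""
    classes = np.unique(labels)
    centroids = np.stack([z_train[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_emotion(z_test, classes, centroids):
    """Assign each test embedding to the label of the nearest centroid."""
    dists = np.linalg.norm(z_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

With well-separated training clusters, for example latent points near the origin labelled "neutral" and points near (5, 5) labelled "anger" (illustrative labels only), `predict_emotion` maps a new point to whichever centroid is closer.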

    Synthesize MRI vocal tract data during CV production

    A set of rtMRI image transformations over time is computed during the production of a consonant-vowel (CV) sequence and is then applied to a new speaker in order to synthesize his/her pseudo-rtMRI CV data. The synthesized images are compared with the original ones using image cross-correlation. Purpose: to enlarge MRI speech corpora by synthesizing data.
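The abstract names image cross-correlation as the comparison metric without giving its exact form. A minimal sketch, assuming the zero-mean normalized cross-correlation (the Pearson correlation of pixel intensities) between two images of identical size, averaged frame by frame over a sequence; a score of 1 means the images match up to a positive brightness and contrast change. The function names are hypothetical.

```python
import numpy as np


def normalized_cross_correlation(img_a, img_b) -> float:
    """Zero-mean normalized cross-correlation of two equally sized images,
    in [-1, 1]; 1 means identical up to brightness/contrast changes."""
    a = np.asarray(img_a, dtype=float)
    b = np.asarray(img_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant image")
    return float(np.sum(a * b) / denom)


def mean_sequence_ncc(frames_a, frames_b) -> float:
    """Average frame-by-frame score for two image sequences of equal length,
    e.g. synthesized vs. original rtMRI frames (hypothetical usage)."""
    if len(frames_a) != len(frames_b):
        raise ValueError("sequences must have the same number of frames")
    return float(np.mean([normalized_cross_correlation(a, b)
                          for a, b in zip(frames_a, frames_b)]))
```

Because the mean is subtracted and the result is normalized, a uniform intensity offset between the synthesized and original acquisitions does not lower the score, which is usually desirable when comparing MR images from different sessions.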

    A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research

    In this work we describe the creation of ArtSpeechMRIfr: a real-time as well as static magnetic resonance imaging (rtMRI, 3D MRI) database of the vocal tract. The database also contains processed data: denoised audio, its phonetically aligned annotation, articulatory contours, and vocal tract volume information, which makes it a rich resource for speech research. The database is built on data from two male speakers of French. It covers a number of phonetic contexts in the controlled part, as well as spontaneous speech, 3D MRI scans of sustained vocalic articulations, and scans of the subjects' dental casts. The rtMRI corpus consists of 79 synthetic sentences constructed from a phonetized dictionary, which makes it possible to shorten the acquisitions while keeping very good coverage of the phonetic contexts that exist in French. The 3D MRI data include acquisitions of 12 French vowels and 10 consonants, each of which was pronounced in several vocalic contexts. Articulatory contours (tongue, jaw, epiglottis, larynx, velum, lips) as well as 3D volumes were manually drawn for a subset of the images.
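The abstract does not say how the 79 sentences were built from the phonetized dictionary. A common technique for this kind of coverage problem is greedy set cover; the sketch below illustrates it under two assumptions that are ours, not the authors': phonetic contexts are approximated by diphones (pairs of adjacent phones), and each candidate sentence is already available as a list of phone symbols. `diphones`, `greedy_cover` and the phone symbols in the usage note are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Set of adjacent phone pairs in one phonetized sentence."""
    return {(phones[i], phones[i + 1]) for i in range(len(phones) - 1)}


def greedy_cover(candidates: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Greedy set cover: repeatedly pick the sentence that adds the most
    not-yet-covered diphones, stopping when nothing new is added or the
    sentence budget is exhausted."""
    remaining = {name: diphones(phones) for name, phones in candidates.items()}
    covered: Set[Diphone] = set()
    chosen: List[str] = []
    for _ in range(max_sentences):
        best = max(remaining, key=lambda name: len(remaining[name] - covered),
                   default=None)
        if best is None or not remaining[best] - covered:
            break
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen
```

For example, with candidates `{"s1": ["b", "a", "t"], "s2": ["b", "a"], "s3": ["t", "o", "b"]}`, `greedy_cover(candidates, 3)` returns `["s1", "s3"]`: `s2` adds no diphone that `s1` has not already covered, so it is never selected, which is how a greedy cover keeps the recording time short.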

    Towards a dynamic, three-dimensional model of a generic speaker for the study of geometric simplifications of the vocal tract from magnetic resonance imaging data

    In this thesis we used MRI (Magnetic Resonance Imaging) data of the vocal tract to study speech production. The first part consists of a study of the impact that the velum, the epiglottis and the head position have on the phonation of five French vowels. Acoustic simulations were used to compare the formants of the studied cases with those of the reference in order to measure this impact. For this part of the work, we used static 3D MR (Magnetic Resonance) images. As speech is usually a dynamic phenomenon, the question arose whether it would be possible to process the 3D data in order to incorporate dynamic information from continuous speech. The second part therefore presents some algorithms that can be used to enhance speech production data. Several image transformations were combined in order to generate estimates of vocal tract shapes that are more informative than the original ones. At this point, beyond enhancing speech production data, we envisaged creating a generic speaker model that could provide enhanced information not for a specific subject, but for speech in general. We therefore devoted the third part to the investigation of an algorithm that can be used to create a spatiotemporal atlas of the vocal tract, which, being speaker-independent, can serve as a reference or standard speaker for speech studies. Finally, the last part of the thesis discusses a selection of open questions in the field that remain unanswered, some interesting directions in which this thesis could be extended, and some potential approaches that could help move forward in these directions.
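The thesis abstract does not specify which image transformations were combined. Assuming, purely for illustration, that they are 2D affine transforms, combining them reduces to multiplying 3x3 homogeneous matrices, and the product can then be applied to contour points in a single step. The helpers `affine`, `compose` and `apply_transform` are hypothetical; non-rigid transformations would need a different representation, such as displacement fields.

```python
import numpy as np


def affine(rotation_deg=0.0, scale=1.0, tx=0.0, ty=0.0) -> np.ndarray:
    """3x3 homogeneous matrix: rotate and scale about the origin, then translate."""
    t = np.deg2rad(rotation_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s, scale * c, ty],
                     [0.0, 0.0, 1.0]])


def compose(*transforms: np.ndarray) -> np.ndarray:
    """Single matrix equivalent to applying `transforms` in the given order
    (the first argument is applied first)."""
    result = np.eye(3)
    for m in transforms:
        result = m @ result
    return result


def apply_transform(matrix: np.ndarray, points) -> np.ndarray:
    """Apply a 3x3 homogeneous transform to an (N, 2) array of 2D points."""
    pts = np.asarray(points, dtype=float)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return (homogeneous @ matrix.T)[:, :2]
```

Order matters: composing a translation by 1 along x followed by a scaling by 2 maps the point (1, 0) to (4, 0), whereas the reverse order would give (3, 0).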

    Three-dimensional surface curvature estimation using quadric surface patches

    Recent advances in 3D scanning technology have enabled the development of interesting applications of 3D human body modelling and shape analysis, especially in the areas of virtual shopping, custom clothing and sizing surveys for the clothing industry. Most current applications have so far been concerned with automatic tape-measurement extraction, i.e. simulation of the manual procedure for extracting a set of body measurements by conventional means such as tapes and callipers. This has been an important advance, since it has made it possible to extract sets of measurements non-intrusively in a few seconds rather than in the usual 40-45 minutes, which also involve intrusive physical contact. However, this approach has a number of problems and also fails to exploit the full potential of 3D imaging technology, as the 3D data sets obtained by a scanner are still reduced to a collection of 1D measurements in order to describe body shape. The approach we present is a method for detecting significant geometric features on the 3D body surface. These features (such as ridges and umbilic points) require no a priori anatomical information and can be used to drive matching and modelling algorithms such as deformable Active Shape Models. The approach is truly hardware-independent and will work on any set of reasonably complete 3D data, whether it is a raw point cloud or a pre-processed, canonical-typ
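The abstract does not detail how the quadric patches are fitted. A minimal sketch, assuming the neighbourhood of a surface point has already been expressed in a local frame with the point at the origin and the z axis roughly along the surface normal: fit z = a x^2 + b xy + c y^2 + d x + e y + f by least squares, then evaluate the standard Monge-patch formulas for Gaussian (K) and mean (H) curvature at the origin. The function names are hypothetical. Umbilic points, one of the features mentioned above, are where the two principal curvatures coincide (H^2 = K).

```python
import numpy as np


def fit_quadric(points) -> np.ndarray:
    """Least-squares fit of z = a x^2 + b xy + c y^2 + d x + e y + f to an
    (N, 3) array of local-frame points (N >= 6). Returns [a, b, c, d, e, f]."""
    p = np.asarray(points, dtype=float)
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    design = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coeffs


def curvatures_at_origin(coeffs):
    """Gaussian curvature K, mean curvature H and principal curvatures
    (k1 >= k2) of the fitted patch at x = y = 0, for the upward (+z) normal."""
    a, b, c, d, e, _ = coeffs
    zx, zy = d, e                      # first derivatives at the origin
    zxx, zxy, zyy = 2 * a, b, 2 * c    # second derivatives at the origin
    g = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / g**2
    H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy + (1 + zy**2) * zxx) / (2 * g**1.5)
    disc = np.sqrt(max(H**2 - K, 0.0))  # clamp tiny negative round-off
    return K, H, H + disc, H - disc
```

On a small cap sampled from a sphere of radius 2, the fit recovers K close to 0.25 and H close to -0.5 with k1 close to k2, as expected for a surface whose every point is umbilic; ridge detection would instead look for extrema of a principal curvature along its own direction.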